ABSTRACT

This paper introduces a novel non-Separable sPAtioteMporal
filter (non-SPAM) which enables the spatiotemporal
decomposition of a still image. The construction of this filter
is inspired by a model of the retina, which selectively
transmits information to the brain. The non-SPAM filter
mimics the way the retina extracts the information needed by
a dynamic encoding/decoding system. We apply the non-SPAM
filter to a still image which is flashed for a long time.
We prove that the non-SPAM filter decomposes the still image
over a set of time-varying differences of Gaussians, which
form a frame. We simulate the analysis and synthesis system
based on this frame. This system yields a progressive
reconstruction of the input image. Both the theoretical and
numerical results show that the quality of the reconstruction
improves as time increases.

Index Terms — Bio-inspired processing, dynamic en- 
coder, non-separable spatiotemporal filter, frame, dual frame. 

1. INTRODUCTION 

During the last decades there has been great progress in
image and video compression standards, which makes it possible
to meet the current needs of coding and decoding High
Definition (HD) and Ultra High Definition (UHD) signals [1].
However, this rate of progress seems very low compared
to the rate at which the amount of data to be transmitted
or stored is increasing. For instance, H.265/HEVC [2]
was released in 2013, ten years after the previous
standard H.264/AVC [3, 4]. This lack of synchronization restricts
the evolution of these standards, which currently relies on
parameterizations and/or improvements of the basic architectures
rather than on groundbreaking approaches.

In this paper we study an alternative compression and
decompression model, which is based on the behavior of the
visual system. The way the luminance of light is captured,
transformed and compressed by the inner part of the eye, the
retina, seems to follow the basic principles of compression.
The retinal function has been explicitly modeled by
neuroscientists, and the experimental results have shown that it
should be a very efficient "compression" model [5]. This is
due to the fact that the retina is a layered structure of different
kinds of cells. The number of cells decreases as they get
closer to the optic nerve. This structure succeeds in encoding
the information so that it fits the limited capacity of the
optic nerve. The signal which reaches the eyes is successfully
transmitted to the brain despite this bottleneck [6, 7].

Our goal is to study the retinal-inspired transformation
from the signal processing point of view and to lay the basis
for our future bio-inspired dynamic codec. A first attempt
at modeling this kind of filter was proposed in [8]. It is
a separable spatiotemporal filter structured as a Difference of
Gaussians (DoG) pyramid based on [9, 10]. Each layer of
this structure is delayed with an exponential temporal function.
We improve the filter by explicitly taking time into account
in the design of our non-Separable sPAtioteMporal
(non-SPAM) filter.

In coding/decoding systems, the transformation of the
signal is extremely important because it results in a more
suitable representation in terms of the number of informative
coefficients. There are different kinds of transformations,
such as the Discrete Fourier Transform (DFT) [11], the Discrete
Cosine Transform (DCT) [12] and the Discrete Wavelet Transform
(DWT) [13, 14, 15, 16], which are currently used in most
conventional lossy compression standards (e.g., JPEG and
JPEG2000). In addition, there are other kinds of filters which
have been built to serve not only image compression
but also general image and video processing. For
instance, the Gaussian and Laplacian pyramids [9] are scaled
spatial filters which allow a progressive transmission of a
signal. Other kinds of pyramids include the oriented
pyramids using Gabor functions [17] and the orthogonal
pyramids [18]. Many of these approaches have been extended,
by taking time into account, into spatiotemporal filters which
have been applied especially in video surveillance and object
tracking techniques [19, 20].

In section 2 we introduce the non-SPAM filter and
explain its bio-inspired nature. We also apply this filter to
a still image which is flashed for a long time. Then, we prove
in section 3 that the non-SPAM filter has a frame structure.
In section 4, we propose a progressive reconstruction of the
input signal, which is numerically illustrated in section 5. In
the last section, we conclude the paper with a discussion of
future work.



2. NON-SPAM FILTER 
The aim of this section is to introduce the non-SPAM filter
and to study its behavior. This filter is inspired by the mechanism
of the photoreceptors and horizontal cells which lie inside
the retina (the innermost part of the eye). These cells act as edge
detectors and at the same time as motion detectors due to the
way they connect to each other. These are the features that
the non-SPAM filter tries to mimic, having a spatial behavior
which varies with respect to time. The non-separability of
space and time enables the filter to detect temporal variations
of luminance even in a uniform spatial region. This is not
the case for a separable spatiotemporal filter. We study the
non-SPAM filter for the analysis and synthesis of an image
$f(x, t)$, where $x \in \mathbb{R}^2$ and $t \in \mathbb{R}^+$, which is observed during
a certain time interval. The spatiotemporal convolution of the
non-SPAM filter and the input image results in the function $A(x, t)$,
called the activation degree:

    A(x, t) = K(x, t) \overset{x,t}{*} f(x, t),    (1)

where $\overset{x,t}{*}$ is the convolution with respect to space and time.
We now define the non-SPAM filter in continuous
time and space as follows:

    K(x, t) = C(x, t) - S(x, t),    (2)

where $C(x, t)$ and $S(x, t)$ are the center and the surround
spatiotemporal filters given by equations (3) and (4) respectively:

    C(x, t) = w_c G_{\sigma_c}(x) T(t),    (3)

    S(x, t) = w_s G_{\sigma_s}(x) (T \overset{t}{*} E_{\tau_S})(t),    (4)

where $w_c$ and $w_s$ are constant parameters, $G_{\sigma_c}$ and $G_{\sigma_s}$ are
spatial Gaussian filters standing for the center and surround
areas respectively, and $E_{\tau_S}$ is an exponential temporal filter.
The center temporal filter $T(t)$ is given by:

    T(t) = E_{\tau_G, n} \overset{t}{*} (\delta_0 - w_c E_{\tau_C})(t),    (5)

where the gamma temporal filter $E_{\tau_G, n}(t)$ is defined by

    E_{\tau, n}(t) = \frac{t^n \exp(-t/\tau)}{\tau^{n+1}},    (6)

where $n \in \mathbb{N}$ and $\tau$ is a constant parameter ($E_{\tau, n}(t) = 0$ for
$t < 0$), $E_{\tau_C}$ is an exponential temporal filter, $\delta_0$ is the Dirac
function and $\overset{t}{*}$ stands for the temporal convolution. When
$n = 0$, the gamma filter reduces to an exponential temporal
filter. The convolution of the temporal filter $T(t)$ with the
exponential filter $E_{\tau_S}$ accounts for the delay in the appearance
of the surround temporal filter with respect to the center one.

The input signal is a still image which exists for a long
time, hence $f(x, t) = f(x) 1_{[0,\infty]}(t)$, where $f(x)$ is the still
image and $1$ is the indicator function such that $1_{[0,\infty]}(t) = 1$
if $0 \le t \le \infty$, and 0 otherwise. In this case, we obtain the
following simplified convolution formula.


Fig. 1: Temporal filters $R_c(t)$ and $R_s(t)$.


Proposition 2.1. For a still image $f(x, t) = f(x) 1_{[0,\infty]}(t)$,
the spatiotemporal convolution (1) turns into a spatial convolution:

    A(x, t) = \phi(x, t) \overset{x}{*} f(x),    (7)

where $\phi(x, t)$ is a spatial DoG filter weighted by two temporal
filters $R_c(t)$ and $R_s(t)$:

    \phi(x, t) = w_c R_c(t) G_{\sigma_c}(x) - w_s R_s(t) G_{\sigma_s}(x),    (8)

    R_c(t) = \int_{t'=0}^{t} T(t - t') \, dt',    (9)

    R_s(t) = \int_{t'=0}^{t} (T \overset{t}{*} E_{\tau_S})(t - t') \, dt'.    (10)
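A short sketch of why Proposition 2.1 holds may help here. It is not taken from the authors' proof; it only assumes that the temporal filters are causal (they vanish for $t < 0$, as stated after (6)), so that the indicator function simply truncates the temporal integral to $[0, t]$:

```latex
\begin{align*}
A(x,t) &= \int_{\mathbb{R}^2}\int_{\mathbb{R}} K(x',t')\, f(x-x')\, 1_{[0,\infty]}(t-t')\, dt'\, dx' \\
       &= \int_{\mathbb{R}^2} \Big( \int_{0}^{t} K(x',t')\, dt' \Big) f(x-x')\, dx', \\
\int_{0}^{t} K(x,t')\, dt'
       &= w_c\, G_{\sigma_c}(x) \int_{0}^{t} T(t')\, dt'
        - w_s\, G_{\sigma_s}(x) \int_{0}^{t} (T \overset{t}{*} E_{\tau_S})(t')\, dt' .
\end{align*}
```

The change of variable $t' \mapsto t - t'$ turns the two remaining integrals into the weights $R_c(t)$ and $R_s(t)$, which gives the time-varying DoG $\phi(x, t)$.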



Fig. 2: The non-SPAM filter is a 2D spatially symmetric function.
This figure shows a transversal cut of its spectrum for 5
different time samples of $R_c(t)$ and $R_s(t)$.

The above proposition is crucial because it
allows the non-SPAM filter to be simplified and represented
as a time-varying DoG. DoG filters have
been extensively studied in the past [10, 21, 22]. Proposition 2.1
shows that the retinal-inspired filter can be modeled
by a spatial DoG filter which is multiplied by the temporal
filters $R_c(t)$ and $R_s(t)$ (Fig. 1), which act as weights
and modify its spatial spectrum with respect to time (Fig. 2).
The non-SPAM filter is shown in Fig. 2, where the parameters
have been tuned according to neuroscientific results
which approximate the retinal spectrum and the speed of
retinal processing: $\tau_C = 20 \cdot 10^{-3}$, $\tau_S = 4 \cdot 10^{-3}$,
$\tau_G = 5 \cdot 10^{-3}$, $w_s = 1$, $w_c = 0.75$, $\sigma_c = 0.5$, $\sigma_s = 1.5$. One can
notice that, after a while, both temporal filters $R_s(t)$ and
$R_c(t)$ converge to the same value, which is established in the
following proposition.
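The behavior of the two temporal weights can be checked numerically. The sketch below is only an illustration under stated assumptions: the order `n = 1` of the gamma filter, the reading of the time constants in seconds, the step `dt` and the horizon `t_max` are choices made here, not values given in the text. The function names are hypothetical.

```python
import numpy as np

def gamma_filter(t, tau, n=0):
    """Gamma temporal filter of eq. (6): t^n exp(-t/tau) / tau^(n+1), zero for t < 0."""
    t = np.asarray(t, dtype=float)
    out = np.zeros_like(t)
    pos = t >= 0
    out[pos] = t[pos] ** n * np.exp(-t[pos] / tau) / tau ** (n + 1)
    return out

def temporal_weights(t_max=0.2, dt=1e-4, tau_c=20e-3, tau_s=4e-3,
                     tau_g=5e-3, w_c=0.75, n=1):
    """Return the time axis and the weights R_c(t), R_s(t) of eqs. (9)-(10).

    n, dt and t_max are assumptions of this sketch.
    """
    t = np.arange(0.0, t_max, dt)
    e_g = gamma_filter(t, tau_g, n)
    e_c = gamma_filter(t, tau_c)  # exponential filter (n = 0)
    e_s = gamma_filter(t, tau_s)  # exponential filter (n = 0)
    # Eq. (5): convolving with delta_0 leaves e_g unchanged.
    T = e_g - w_c * np.convolve(e_g, e_c)[: t.size] * dt
    # Delayed surround temporal filter T * E_{tau_S} used in eq. (4).
    T_s = np.convolve(T, e_s)[: t.size] * dt
    # Running integrals over [0, t], approximated by Riemann sums.
    return t, np.cumsum(T) * dt, np.cumsum(T_s) * dt

t, R_c, R_s = temporal_weights()
print(R_c[-1], R_s[-1])  # both should approach 1 - w_c = 0.25
```

With normalized filters, the integral of $T$ is $1 - w_c$ and the exponential filter $E_{\tau_S}$ integrates to one, which is why both weights settle at the same level in this sketch.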

Proposition 2.2. The filter $\phi(x, t)$ is a continuous and
infinitely differentiable function such that $\phi(x, 0) = 0$ and

    \lim_{t \to +\infty} \phi(x, t) = \phi(x),

where $\phi(x)$ is a DoG filter.

In practice, $\phi(x, t)$ almost converges within a short time
delay $\Delta t$. Hence, we assume that $\phi(x, \Delta t) \approx \phi(x)$ for all $x$,
so the non-SPAM filter only evolves during the time
interval $[0, \Delta t]$. The time at which this convergence almost occurs
marks the end of the spatial evolution of the non-SPAM filter.
In other words, the non-SPAM filter is able to decompose
the input image into different spatial subbands and to extract
information during a certain period of time $\Delta t$. After $\Delta t$, all the
necessary information has already been selected (Fig. 3).

Fig. 3: Image decomposition by the non-SPAM filter.


3. NON-SPAM FRAME 

The goal of this section is to prove that the non-SPAM filter
is invertible, so that we are able to reconstruct the input
image. To this end, we establish that the non-SPAM filter
has a frame structure [22, 23]. For numerical purposes, we
need to discretize the non-SPAM filter. Let $x_1, \dots, x_n \in \mathbb{R}^2$
and $t_1, \dots, t_m \in \mathbb{R}^+$ be some sets of spatial and temporal
sampling points. The continuous spatial
convolution is then approximated by the discrete convolution:


    A(x_k, t_j) = \phi(x_k, t_j) \circledast f(x_k)
                = \sum_{i=1}^{n} \phi(x_k - x_i, t_j) f(x_i),    (11)

for all $k$ and $j$. Let $\varphi_{k,j}$ be the row vector of $\mathbb{R}^n$ defined by

    \varphi_{k,j} = (\phi(x_k - x_1, t_j), \dots, \phi(x_k - x_n, t_j))

and $\Phi$ be the family of all these vectors:

    \Phi = \{ \varphi_{k,j} \}_{1 \le k \le n, \, 1 \le j \le m}.    (12)

Let $f = (f(x_1), \dots, f(x_n))$ be the discretized image and $\|f\|$
its Euclidean norm. Let us denote by $\hat{\phi}_{t_j}(\xi)$ the discrete Fourier
transform of the vector $(\phi(x_1, t_j), \dots, \phi(x_n, t_j))$.

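Proposition 3.1. The family of vectors $\Phi$ is a frame, i.e. there
exist two scalars $0 < \alpha \le \beta < \infty$ such that:

    \alpha \|f\|^2 \le \sum_{j=1}^{m} \sum_{k=1}^{n} |A(x_k, t_j)|^2 \le \beta \|f\|^2,    (13)

where

    \alpha = \min_{\xi} \Big\{ \sum_{j=1}^{m} |\hat{\phi}_{t_j}(\xi)|^2 \Big\},

    \beta = \sum_{j=1}^{m} \sum_{k=1}^{n} \sum_{i=1}^{n} \phi^2(x_k - x_i, t_j).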


The proof of Proposition 3.1 is omitted for lack of space.
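Although the proof is omitted, the frame inequality can be checked numerically on a toy case. The sketch below is a simplified illustration, not the authors' setting: it assumes a 1D ring of $n$ samples with circular convolution (so that every time block is a circulant matrix), and the pairs of temporal weights are hypothetical values. The names `build_frame` and `frame_bounds` are introduced here for illustration only.

```python
import numpy as np

def gaussian(d, sigma):
    """Normalised 1D Gaussian evaluated at signed distances d."""
    return np.exp(-d ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

def build_frame(n, weights, w_c=0.75, w_s=1.0, sigma_c=0.5, sigma_s=1.5):
    """Stack one circulant block per time sample; weights = [(R_c, R_s), ...]."""
    idx = np.arange(n)
    # Signed circular distance x_k - x_i on a ring of n samples (assumption).
    d = (idx[:, None] - idx[None, :] + n // 2) % n - n // 2
    blocks = [w_c * rc * gaussian(d, sigma_c) - w_s * rs * gaussian(d, sigma_s)
              for rc, rs in weights]
    return np.vstack(blocks), blocks

def frame_bounds(blocks):
    """Lower bound from the DFT of each kernel, upper bound from squared entries."""
    # For circulant blocks, the eigenvalues of Phi^T Phi are the sums over time
    # of the squared moduli of the DFT of each block's first column.
    spectrum = sum(np.abs(np.fft.fft(b[:, 0])) ** 2 for b in blocks)
    alpha = spectrum.min()
    beta = sum((b ** 2).sum() for b in blocks)
    return alpha, beta

rng = np.random.default_rng(0)
Phi, blocks = build_frame(16, [(0.1, 0.0), (0.3, 0.1), (0.25, 0.25)])
alpha, beta = frame_bounds(blocks)
f = rng.standard_normal(16)
energy = np.sum((Phi @ f) ** 2)  # sum over j and k of |A(x_k, t_j)|^2
print(alpha * (f @ f) <= energy <= beta * (f @ f))
```

In this circular setting the lower bound is the smallest eigenvalue of $\Phi^T \Phi$, while the upper bound is its trace and is therefore generally not tight.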


4. PROGRESSIVE RECONSTRUCTION 

The progressive reconstruction consists in computing an estimate
$f_{t_m}$ of the discretized image $f$ at time $t_m$ by using a
limited number of coefficients. In fact, at time $t_m$, all the
coefficients of the non-SPAM frame have been computed, but the
reconstruction only exploits a small number of them. We use
the term progressive because the quality of the reconstruction
increases as the number of coefficients in use increases.

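When the decoder knows all the coefficients, it can
reconstruct the input signal perfectly. Let
us define $A = [A_{t_1}, \dots, A_{t_m}]$ as a vector of size $nm$ and
$\Phi = [\varphi_1, \dots, \varphi_m]$ as a matrix of size $nm \times n$, where

    A_{t_j} = \begin{bmatrix} A(x_1, t_j) \\ \vdots \\ A(x_n, t_j) \end{bmatrix}
    \quad \text{and} \quad
    \varphi_j = \begin{bmatrix} \varphi_{1,j} \\ \vdots \\ \varphi_{n,j} \end{bmatrix}.    (14)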

At time $t_m$, the estimate $f_{t_m}$ is given by

    f_{t_m} = (\Phi^T \Phi)^{-1} \Phi^T A,    (15)

where $M^{-1}$ denotes the inverse of a matrix $M$ and $M^T$ denotes
its transpose. The dual frame, which is necessary to
have a perfect decoding at time $t_m$ [22, 23], is $(\Phi^T \Phi)^{-1} \Phi^T$.
Instead of computing the above matrix operator, which can be
time consuming and resource demanding, we note that
(15) is the solution of the following least squares problem:

    f_{t_m} = \arg\min_{g \in \mathbb{R}^n} \|A - \Phi g\|^2.    (16)

This minimization problem can be easily solved by using a
gradient descent algorithm.
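A minimal sketch of such a gradient descent is given below. The step size rule, the iteration count and the function name `reconstruct_gd` are assumptions of this example, since the text does not specify which variant is used.

```python
import numpy as np

def reconstruct_gd(Phi, A, n_iter=2000, step=None):
    """Minimise ||A - Phi g||^2 over g by plain gradient descent.

    Phi: (n*m, n) frame matrix, A: (n*m,) coefficient vector.
    """
    if step is None:
        # Safe step: inverse of the squared largest singular value of Phi.
        step = 1.0 / np.linalg.norm(Phi, 2) ** 2
    g = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        # Phi^T (Phi g - A) is half the gradient of the squared residual norm.
        g -= step * Phi.T @ (Phi @ g - A)
    return g
```

Each iteration only needs products with $\Phi$ and $\Phi^T$, so the matrix $(\Phi^T \Phi)^{-1}$ is never formed. The number of iterations required grows with the ratio between the largest and smallest eigenvalues of $\Phi^T \Phi$.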

The progressive reconstruction of the decoder is computed
by selecting, for each time bin, the same percentage
of spatial coefficients, which are used to produce the
reconstructed image. The decoder sorts the coefficients of each
time bin in descending order and then
extracts the coefficients with the highest energy, omitting the rest
of the data. The numerical results show that the
reconstruction improves as the number of extracted coefficients
grows. This approach is based on the Rank-Order-Coding
(ROC) model proposed in [24].
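The selection step can be sketched as follows. This is an illustrative reading of the description above, not the authors' implementation: it assumes that the coefficients are arranged as one row per time bin, that "energy" means magnitude, and that the omitted coefficients are set to zero before reconstruction. The function names are hypothetical.

```python
import numpy as np

def keep_top_fraction(A, fraction):
    """Keep the largest-magnitude coefficients of each time bin, zero the rest.

    A: array of shape (m, n), one row per time bin.
    fraction: share of coefficients kept in every row, between 0 and 1.
    """
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    k = int(round(fraction * n))
    out = np.zeros_like(A)
    if k == 0:
        return out
    # Column indices of the k largest magnitudes in each row.
    order = np.argsort(-np.abs(A), axis=1)[:, :k]
    rows = np.arange(m)[:, None]
    out[rows, order] = A[rows, order]
    return out

def mse(f, f_rec):
    """Mean square error ||f - f_rec||^2 / n between two images."""
    f = np.asarray(f, dtype=float)
    f_rec = np.asarray(f_rec, dtype=float)
    return np.sum((f - f_rec) ** 2) / f.size
```

The thinned coefficient array, flattened in the same order as the rows of $\Phi$, would then be passed to the reconstruction step in place of the full vector $A$.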


(d) 60%: MSE(f, f_{t_m}) = 2.5820e+03
(e) 80%: MSE(f, f_{t_m}) = 664.4727
(f) 100%: MSE(f, f_{t_m}) = 0.3884

Fig. 5: Progressive reconstruction based on ROC model.


5. RESULTS

This section gives the numerical results of the progressive
decoding introduced in the previous section. The parameters of
the simulation are given in Fig. 2. The image is composed of
64 x 64 pixels. We define $MSE(f, f_{t_m}) = \|f - f_{t_m}\|^2 / n$ as the
mean square error, which measures the distortion between the
original image $f$ and the reconstructed image $f_{t_m}$.

(b) MSE(f, f_{t_m}) = 0.3884

Fig. 4: Frame-based reconstruction.

Fig. 4 shows the optimal reconstruction when all the
coefficients are used. The progressive reconstruction based
on the ROC model is given in Fig. 5. As the percentage of
coefficients increases, the quality of the reconstruction
improves. However, even for a very small percentage of the
total number of coefficients (e.g., 40%), the sharpness of the
reconstructed image makes it possible to recognize the basic
structures of the input image.


6. CONCLUSION

In this paper, we have introduced a novel non-separable
spatiotemporal filter (non-SPAM) based on a retinal model.
This filter has a time-varying behavior. The progressive
reconstruction exploits the rank-order-coding model to
reconstruct the image by using a limited number of coefficients.
In future work, we aim to extend this filtering approach to
video streams.